Diwan et al. BMC Genomics 2014, 15:142 
http://www.biomedcentral.com/1471-2164/15/142



Genomics 



METHODOLOGY ARTICLE Open Access 



Systematic genome sequence differences among 
leaf cells within individual trees 

Deepti Diwan1†, Shun Komazaki1†, Miho Suzuki1, Naoto Nemoto1, Takuyo Aita3, Akiko Satake2 and
Koichi Nishigaki1*



Abstract 

Background: Even in the age of next-generation sequencing (NGS), it has been unclear whether or not cells within 
a single organism have systematically distinctive genomes. Resolving this question, one of the most basic biological 
problems associated with DNA mutation rates, can assist efforts to elucidate essential mechanisms of cancer. 

Results: Using genome profiling (GP), we detected considerable systematic variation in genome sequences among 
cells in individual woody plants. The degree of genome sequence difference (genomic distance) varied 
systematically from the bottom to the top of the plant, such that the greatest divergence was observed between 
leaf genomes from uppermost branches and the remainder of the tree. This systematic variation was observed 
within both Yoshino cherry and Japanese beech trees. 

Conclusions: As measured by GP, the genomic distance between two cells within an individual organism was 
non-negligible, and was correlated with physical distance (i.e., branch-to-branch distance). This phenomenon was 
presumed to result from the accumulation of mutations at each cell division, implying that the degree of
divergence is proportional to the number of generations separating the two cells. 
Background

At the beginning of the 21st century, genome sequences 
of two closely related species, human and chimpanzee, 
were found to differ by approximately 4% based on con- 
ventional genome sequencing technology [1]. With the 
advent of next-generation sequencing (NGS), it has been 
established that each person has a unique genome [2]. 
Within a single organism, genome sequences may be 
epigenetically different between cells, and sporadic dif- 
ferences are sometimes present between cells from dif- 
ferent organs [3]. It is not clear, however, whether each 
cell within an individual organism possesses a systemat- 
ically different genome sequence. 

Various breakthroughs have been steadily reshaping
our understanding of genomes. These advances include
accumulating analyses of whole-genome sequences of
individuals [4,5], identification of various non-coding
RNAs [6], discovery of highly repeated sequences [7],
and recognition of frequent recombination of genome
structures [8,9]. Recently, an intensive NGS study of the
fate of cancerous cells revealed that lineages of such
cells mutate vigorously [10], and further papers on this
topic have since appeared [11,12].

On the other hand, genome sequence differences
between normal cells within a single organism have
been examined by copy number variation analysis
[13-15], which revealed frequent mutation in the form
of replication slippage at particular genomic loci. In a
sense, this is a filtered observation of genome alterations
(i.e., one restricted to tandem repeat sequences). Broader
observation of normal genomic DNA is just beginning,
as seen in recent reports [3,16]. Our study is the first to
detect systematic genome sequence differences among
cells in single organisms, i.e., within individuals of two
woody plant species (Figure 1E).



© 2014 Diwan et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative
Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and
reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain
Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article,
unless otherwise stated.







Figure 1 Clustering of Yoshino cherry tree leaves. (A-C) Dendrograms resulting from Ward's cluster analysis of genomic distances of leaves
from five tree branches of a Yoshino cherry tree. Each analysis used genomic distances calculated from one of three independent GP
experimental trials using the same leaves. Genomic distances are displayed on dendrogram branches. (D) Dendrogram obtained from global
clustering of leaves from the Yoshino cherry tree. Genomic distances analyzed were calculated from averaged spiddos data obtained from three
independent GP experimental trials using the same leaves (for details see Additional file 4: Table S2). (E) Yoshino cherry tree from which young
leaves were sampled in April 2010, after the flowering season. The tree was located on the campus of Saitama University.



According to the genetic mosaicism hypothesis, tall,
long-lived plants accumulate spontaneous mutations that
spread among modules (shoots, branches, leaves, etc.),
so that the plants become genetically mosaic as they
grow [17]. This hypothesis rests explicitly on the idea of
a finite spontaneous mutation rate. That is, DNA
replication proceeds with limited accuracy, i.e., 10⁻⁶ to
10⁻⁹ errors/base/replication [18], and thus every
replicated genome sequence (e.g., the 3 × 10⁹-bp
sequence of the human genome) naturally differs from
its parental genome. In general, these differences have
been too small to detect directly, as they often fall
below the detection limit of sequencing analysis.
Consequently, mutation rates have conventionally been
estimated indirectly from phenotypic changes, such as
variation in antibiotic resistance. This situation has
changed with the advent of NGS, which enables the
detection of low-frequency mutations [19]. However, its
application is limited, mainly by high cost and the
difficulty of data processing [20].
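To make the scale of these numbers concrete, the short Python sketch below multiplies the genome size by the per-base error rates quoted above. It is a back-of-the-envelope expectation only: it assumes every base is copied independently at the same rate and ignores repair, and the function name is illustrative rather than taken from any published tool.

```python
from fractions import Fraction

HUMAN_GENOME_BP = 3 * 10**9  # example genome size quoted in the text


def expected_errors(genome_size_bp: int, error_rate: Fraction) -> Fraction:
    """Mean number of new errors when a genome of this size is copied once.

    Assumes every base is copied independently with the same per-base
    error rate and ignores repair.
    """
    return genome_size_bp * error_rate


for exponent in (6, 9):
    rate = Fraction(1, 10**exponent)  # 10^-6 and 10^-9 errors/base/replication
    print(f"error rate 1e-{exponent}: "
          f"{expected_errors(HUMAN_GENOME_BP, rate)} errors per replication")
```

Under these assumptions, each copy of a human-sized genome is expected to carry about 3 new errors at the lowest quoted rate and about 3000 at the highest.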

Fortuitously, Genome Profiling (GP) (Figure 2), an easily
operable and informative genome analysis method [21-28],
is sensitive enough to detect differences between closely
related cells [27,28]. Compared with conventional
sequencing approaches, GP involves two unique proce-
dures (Figure 2): i) collection of DNA fragments from
genomic DNA by random PCR [29] and ii) acquisition of
DNA sequence information using micro-temperature gra-
dient gel electrophoresis (μTGGE) by separating DNA
fragments and observing their melting profiles (Figure 2B)






[Figure 2 image. Panels: (A) Random PCR (primer); (B) Temperature Gradient Gel Electrophoresis (TGGE), temperature gradient 15°C to 65°C, marking the pre-spiddo point (initial melting point, strand dissociation point) of dsDNA; (C) Data normalization and PaSS calculation, Genome Distance (d_G) = 1 - PaSS.]



Figure 2 Overview of the Genome Profiling (GP) method. The entire GP process consists of three steps: (A) random sampling of DNA
fragments from genomic DNA (i.e., random PCR), (B) acquisition of sequence information without sequencing (i.e., μTGGE analysis), and
(C) computer-aided conversion of raw data to genome-intrinsic parameters (spiddos). (A) In random PCR, primers bind to various regions of
genomic DNA with mismatch-containing structures under low-stringency conditions, leading to the generation of a set of fragments. (B) In
μTGGE, DNA fragments loaded at the top of a slab gel migrate downward with a characteristic curvature caused by the temperature gradient.
The pre-spiddo point of a DNA fragment (i.e., the initiation of the melting-derived transition from double-stranded to single-stranded DNA) is
indicated by a red dot. (C) Pre-spiddo points (red dots) are indicated in images a and b for genomes a and b, respectively. Species identification
dots (spiddos), shown in diagrams a' and b', are obtained by normalizing the coordinates of pre-spiddo points with respect to internal reference
DNA fragments (white dots). Spiddos thus obtained are used to calculate the pattern similarity score (PaSS) or genomic distance (d_G = 1 - PaSS).



[30-32]. In this method, spiddos (species identification
dots), which are derived from DNA sequence information
[22], play the pivotal role in identifying a genome and
allow the genomic distance to be measured (see
Methods).



GP has been used as a tool for universal species iden- 
tification [21,24,27,28,33] and as an accurate detector 
of mutation [34,35]. In this study, we applied the GP 
method to a new challenge: detection of extremely
small genomic differences between very closely related
cells, with the aim of examining within-organism
sequence variation.

Results and discussion 

We used Japanese beech (Fagus crenata) trees to exam-
ine whether GP could reveal if all leaves within a single
tree had identical genome sequences (Figures 2 and 3).
More specifically, we analyzed sets of species identifica-
tion dots (spiddos), a pivotal GP parameter derived from
genome sequences (Figure 2C). Spiddos were obtained
from genome profiles and are specified by both mobility
and melting temperature, both of which are determined
after calibration and normalization of band patterns by a
computer using co-migrating internal references (see
Methods). Although genome profiles (i.e., DNA patterns
generated by μTGGE analysis) were not always
reproducible because of experimental fluctuations (e.g.,
environmental temperature and instrumental drift),
spiddos were highly reproducible because the
normalization process compensated for these fluctuations
(Figure 2). As shown in Figure 4A, all leaves on the same
Japanese beech branch (e.g., A1-1, A1-2, and A1-3, where
"A1-2" refers to tree A, branch 1, leaf 2) clustered
together. This was also the case for leaves on branches
A2 and A3. Leaves from different branches were found to
have different genome sequences: spiddos of branch A1
and A2 leaves were more similar to one another than to
spiddos of leaves on branch A3, which was located
furthest from the ground (Figure 3). Differences in
spiddos were also observed between leaves belonging to
the same branch, but these differences were at the level
of experimental error and thus cannot be considered
significant at present [22]. These results reveal that,
within statistical significance, leaves from an individual
branch possessed identical genome sequences, but their
sequences differed distinctively from those of other
branches, a finding not previously reported. This result
was further confirmed by a similar experiment using
different Japanese beech individuals. For more general
confirmation, we also analyzed another species, Yoshino
cherry (Prunus × yedoensis), located ~800 km from the
site of the Japanese beech trees (Figure 1). Finally, to
check the results with a method independent of GP, we
sequenced a particular DNA band obtained from GP (see
Additional file 1: Figure S1; for details see Additional
file 2). Throughout these experiments, we consistently
reached the same conclusion: genome sequences within
organisms were not identical, but instead varied
systematically.

Figure 4B reveals that very similar results were ob-
tained from the two additional Japanese beech trees.
Interestingly, the same relationship trend was observed
among all three trees: spiddos of leaves from the
uppermost branches (A3, B3, and C3) were distinct from
spiddos of other leaves (Figure 4B). The cluster
dendrogram in Figure 4B was constructed globally from
the whole set of distances (d_G) obtained from all leaf
spiddos (Additional file 3: Table S1), without any
information about tree or branch membership. The
recovery of the logically expected structure, in which
leaves on the same branch grouped together and branches
on the same tree clustered together, is therefore striking
and demonstrates the effectiveness of this approach. It is
thus evident that the genomes of leaves on a tree are
neither completely identical to one another nor randomly
different but, rather, differ systematically depending on
branch location.

As shown in Figure 1, similar results were reproducibly
obtained using the other species, Yoshino cherry. Results
of cluster analyses of distances (d_G) obtained using
spiddos data from three independent GP experiments on
the same samples from five branches (Additional file 4:
Table S2) are shown in Figure 1A-C; clustering results
based on an average of the three trials are shown in
Figure 1D. The results of the individual experiments
(Figure 1A-C) show basically the same pattern as those
obtained from the statistically more reliable averages
(Figure 1D), indicating that this experimental system has
rather low variance (in other words, a single experiment
can already provide a reasonable estimate), with only a
minor exception: the positional exchange of branches 3
and 4 in Figure 1C. The situation observed in Figure 4
(Japanese beech) also held true for Yoshino cherry, i.e.,
genome profiles of leaves were not identical, but instead
differed systematically. In addition, genomes of leaves
from the uppermost branch (5-1, 5-2, and 5-3) were
genetically distant from those of leaves on middle
branches, indicating a correlation between genomic
distance and branch location. The same phenomenon was
thus observed in two different, widely separated species:
leaves from the same tree have different genome
sequences that can be distinguished using GP.

Our discovery was partially corroborated by further
investigation using direct sequencing. As shown in
Additional file 1: Figure S1, leaves from the same branch
tended to have more closely related sequences, as seen in
pairs of leaves from the same Japanese beech branches
(B2-2 and B2-3, and B3-1 and B3-2; Additional file 1:
Figure S1B) and from closely located branches of
Yoshino cherry (B2-1 and B3-1, and B4-1 and B5-1;
Additional file 1: Figure S1A). Because of missing data
caused by artifacts generated during cloning and
sequencing, these results are somewhat equivocal;
nonetheless, they are congruent with the conclusions
drawn from the GP experiments. In interpreting these
direct sequencing results, the experimental procedures
used, and the limitations of sequencing in general, need
to be taken into account. DNA fragments generated in the
GP experiment were collected by excising their bands
from polyacrylamide gels, the most reliable way to obtain
sequences common to both GP and conventional
sequencing.
Figure 3 An example of raw data used for obtaining genomic distance (d_G). The original data used to obtain Figure 4A (A1-1 to A3-3) are
displayed here to demonstrate how d_G values were obtained. Feature points appearing in the genome profiles (TGGE electrophoretic patterns) of
two leaves, α1 and α2, are indicated by dots. These were processed to provide normalized coordinate data referred to as spiddos (shown in β1
and β2). The computer-processed data (spiddos) from β1 and β2 are superimposed so that differences in the two sets of spiddos can be easily
recognized. To calculate PaSS (defined in Methods), the displacements were summed and divided by the number of spiddos.



Collected DNA was then subjected to cloning and sequen- 
cing, two procedures that can introduce mutations. Many 
spurious sequences were in fact obtained and discarded, 
including sequences having very low sequence similarity 
to the primary sequence generated from the DNA band, 
and sequences of non-plant origin. Although the retained
sequences were within an apparently acceptable range of
sequence consistency (i.e., high similarity), the results
shown in Additional file 1: Figure S1 are thus subject to
limitations inherent to the cloning and sequencing process.
This also illustrates one difficulty encountered when
using a clone-isolation- and sequencing-based approach
to identify mutation frequencies: the two mutation types,
original mutations and sequencing operation-derived
mutations (presumably introduced during template
preparation, PCR amplification, sequencing,
Figure 4 Clustering of beech tree leaves. Sample labels indicate the tree, branch, and leaf (e.g., A1-2 corresponds to leaf 2 of branch 1 of tree
A). (A) Dendrogram resulting from Ward's clustering of genomic distances of Japanese beech tree leaves. Genomic distances are displayed on
dendrogram branches. (B) Dendrogram obtained from cluster analysis performed collectively on three different Japanese beech trees. Leaves
belonging to each tree clustered together in a fashion similar to the dendrogram shown in A, even in this global clustering. Each spiddos data
point used to calculate genomic distance represented the average of two trials using the same leaf (Additional file 3: Table S1). (C) One of the
beech trees sampled in Sapporo, Japan, in late May 2011.



and base-calling), cannot be distinguished in the final
clonal sequencing results. To obtain statistically
significant results using conventional high-precision
sequencing, high-volume sequencing on the scale of
millions of base pairs must be carried out to separate
infrequently occurring mutations (e.g., < 10⁻⁶
mutations/base/replication) from background noise. In
this regard, it should be noted that the ability of the GP
method to overcome this difficulty has been
experimentally demonstrated: GP has been used
successfully for species identification and classification
[24,25,27,28,36] and in high-sensitivity mutation assays
[34,35].
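Why millions of base pairs are needed can be illustrated with a simple Poisson model. The sketch below assumes that mutations occur independently at a uniform per-base rate and ignores sequencing errors entirely (which would only raise the requirement); `bases_for_detection` is a hypothetical helper, not part of any published pipeline.

```python
import math


def bases_for_detection(rate_per_base: float, prob: float = 0.95) -> float:
    """Bases that must be read so that P(at least one mutation) >= prob.

    Poisson model: P(no mutation in N bases) = exp(-N * rate),
    so N = -ln(1 - prob) / rate.
    """
    return -math.log(1.0 - prob) / rate_per_base


n = bases_for_detection(1e-6)
print(f"{n:.2e} bases")  # 3.00e+06 bases
```

Even under these favorable assumptions, observing a single mutation at a rate of 10⁻⁶ per base with 95% probability requires reading about 3 × 10⁶ bases, before any correction for sequencing noise.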



In this study, we have demonstrated that leaves from
the same tree do not have exactly identical genome
sequences. This conclusion is expected to apply to any
multicellular organism, as DNA is not perfectly
replicated in any organism; each genome replication
cycle therefore introduces mutations, which are usually
too infrequent to be detected (10⁻⁶ to 10⁻⁹
mutations/base/replication) [18]. In addition, DNA
methylation, whose degree is expected to differ from
cell to cell and which might potentially induce base
substitutions during PCR, does not affect PCR
amplification [37], as was independently confirmed in
our study (Table 1 and
Table 1 Reagents used for DNA methylation and restriction enzyme cleavage

Step 1. Methylation reaction (10 μl), from the protocol of New England Biolabs Inc.
  Nuclease-free water                          6 μl
  10× HpaII methyltransferase buffer           1 μl
  SAM (80 μM)                                  0.1 μl
  Genomic DNA (10 ng)                          0.4 μl
  HpaII methyltransferase                      2.5 μl

Step 2. HpaII digestion reaction (50 μl)
  Methylation product (taken from step 1)      10 μl
  10× NEBuffer 1                               4 μl
  MgCl₂ (10 mM)                                20 μl
  HpaII restriction enzyme                     4 μl
  Nuclease-free water                          12 μl

SAM, S-adenosylmethionine; NEBuffer, New England Biolabs buffer.

Figure 5). Based on the total number of base pairs in the
DNA bands obtained by random PCR (i.e., roughly 10
bands of about 1000 bp each, or about 10⁴ bp in total),
we tentatively estimate that the GP method has a
detection sensitivity of 10⁻⁴ mutations/base/replication.
More specifically, the total number of mutations
accumulating over g generations, μ(g), can be
calculated using the formula:



$$\mu(g) = \sum_{i=1}^{g} \left[ \mu(i) + \gamma(i) \right] \qquad (1)$$



where μ(i) and γ(i) represent the replication-dependent
and repair-dependent mutation rates in generation i,
respectively. If we tentatively assume μ(i) = μ_c (a
constant) and μ(i) ≫ γ(i) for all i, then



$$\mu(g) = g\,\mu_c \qquad (2)$$
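Equations 1 and 2 can be checked numerically. In the Python sketch below, the rate functions are illustrative stand-ins: μ_c = 10⁻⁸ is the value assumed later in the text, setting γ(i) to zero represents the μ(i) ≫ γ(i) limit, and the 10⁻⁴ detection threshold is the tentative estimate given above. Exact fractions avoid floating-point rounding in the comparison.

```python
from fractions import Fraction

DETECTION_LIMIT = Fraction(1, 10**4)  # tentative GP limit: ~10 bands x 1000 bp
MU_C = Fraction(1, 10**8)             # assumed constant replication-dependent rate


def accumulated_mutations(g, mu, gamma):
    """Eq. 1: sum of mu(i) + gamma(i) over generations i = 1..g (per base)."""
    return sum((mu(i) + gamma(i) for i in range(1, g + 1)), Fraction(0))


def constant_mu(i):
    return MU_C


def negligible_gamma(i):
    return Fraction(0)  # stands in for the mu(i) >> gamma(i) limit


for g in (10**3, 10**4, 10**5):
    load = accumulated_mutations(g, constant_mu, negligible_gamma)
    assert load == g * MU_C  # Eq. 2: mu(g) = g * mu_c
    print(g, load, load >= DETECTION_LIMIT)
```

Under these assumptions, the accumulated load first reaches the detection threshold at g = 10⁴ generations.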



This estimate indicates that the GP method cannot detect
mutations if their accumulated frequency, g·μ_c, is
lower than about 10⁻⁴/base. Consequently, leaf genomes
must contain a significant number of mutations,
equivalent to the sum of replication- and repair-caused
mutations. This finding leads us to consider whether the
large number of estimated mutations implies that
mutation events during replication and repair (a type of
somatic mutation) have been unexpectedly frequent [39],
or instead whether there is a large difference in cell
generations between tree branches, as follows:

If we assume that μ_c = 10⁻⁸ in the above context, then
g, the number of generations, must be

$$g \geq \frac{10^{-4}}{\mu_c} = \frac{10^{-4}}{10^{-8}} = 10^{4} \qquad (3)$$



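Because a longitudinal row of consecutive cells produced
over g' cell generations extends to a length of g'·a,
where g' is the number of cell generations and a is the
unit cell length, we can estimate the number of cell
generations (g') separating two branches. If a = 20 μm
and the branch-to-branch distance, B, is 2 m, then

$$g' = \frac{B}{a} = \frac{2\ \mathrm{m}}{2 \times 10^{-5}\ \mathrm{m}} = 10^{5} > g = 10^{4} \qquad (4)$$

and thus, from Eq. 2,

$$\mu(g') = g'\mu_c = \frac{g'}{g} \times 10^{-4} = \frac{10^{5}}{10^{4}} \times 10^{-4} = 10^{-3} > 10^{-4} \qquad (5)$$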



Based on this tentative calculation, the apparent genomic
distance observed using the GP method, whose detection
limit is about 10⁻⁴/base, is within a reasonable range. In
other words, the accumulated point mutations can be
explained as a consequence of the large generational
difference between cells. Obviously, this conclusion
needs to be confirmed by other approaches. We expect
this unexpectedly large genome-to-genome distance to
draw interest to a topic that has so far received little
attention.
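The arithmetic behind Eqs. 3 to 5 can be reproduced in a few lines. The Python sketch below uses exactly the values assumed in the text (10 bands of 1000 bp, μ_c = 10⁻⁸, a = 20 μm, B = 2 m); the variable names are illustrative, and exact fractions are used so that the printed values are not affected by floating-point rounding.

```python
from fractions import Fraction

detection_limit = Fraction(1, 10 * 1000)  # 10 bands x 1000 bp -> 10^-4 per base
mu_c = Fraction(1, 10**8)                 # assumed replication error rate
a = Fraction(20, 10**6)                   # unit cell length: 20 um, in metres
B = Fraction(2)                           # branch-to-branch distance: 2 m

g_min = detection_limit / mu_c            # Eq. 3: generations needed for detection
g_prime = B / a                           # Eq. 4: generations spanning distance B
load = g_prime * mu_c                     # Eq. 5, via Eq. 2

print(g_min, g_prime, load)               # 10000 100000 1/1000
assert g_prime > g_min and load > detection_limit
```

With these inputs, the generations separating two branches 2 m apart exceed the detection requirement by a factor of ten, giving an accumulated load of 10⁻³ per base.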

Except for cancer cells, cells within an individual
organism have previously been believed to possess
identical genomes. Two brief reports have recently
appeared suggesting that cells from a single individual
might have different genomes [3,16], although neither
hard evidence nor systematic research has yet confirmed
those observations. Nevertheless, these reports are
consistent with the findings of our study.

Conclusions 

The study reported here provides the first systematic
analysis of genome sequence differences among cells in
single individuals using the GP method. Leaf genome
sequences within individual trees were found not to be
identical, but to vary systematically from the bottom to
the top of the tree. Since this phenomenon was detected
by the GP method, which cannot detect mutations at
frequencies below 10⁻⁵/base/replication, a large number
of accumulated mutations must exist between distantly
located cells in the tree.

This fact leads to a natural inference that two cells in
an individual differ in their genome sequences in
relation to their physical distance. In other words, no
two cells have completely identical genome sequences.
This finding and inference are likely to influence the
interpretation of various phenomena, including
mutagenesis and cancer.

Methods 

Leaves of Japanese beech (Buna; Fagus crenata) trees
growing in Sapporo, Japan, and Yoshino cherry (Sakura;
Prunus × yedoensis) trees from Saitama, Japan, were
used in this study. The notation A1-2 denotes leaf 2 on

Figure 5 DNA replication is not affected by DNA methylation. As shown in Panels A, B, and C, the results of three independent tests using
different portions of yeast genomic DNA (which is naturally unmethylated [38]) provide evidence that methylation does not affect PCR results. In
these experiments, random PCR was performed using one of the primers pfm3 (5'-cy3-dCTGGATAGCGTC), pfm10 (5'-cy3-dGCGCATO\GACG),
and pfm12 (5'-cy3-dAGAACGCGCCTG) with Taq DNA polymerase. (Random PCR is a variation of PCR that employs only a single primer and is
performed at a lower annealing temperature [~26°C], generating primer sequence-independent DNA fragments [31].) Lane 1 is a 100-bp size
marker. Bands indicated by α and β (in lane 2 of panels A, B, and C) are DNA fragments containing HpaII methylation/restriction site(s), as their
cleavage resulted in their disappearance from lane 4. The presence of α and β bands in lane 2 in panels A, B, and C demonstrates that these
regions could be amplified by random PCR even though they contained a methylation site.



branch 1 of tree A. Branch numbers were assigned in 
the order in which they appeared, beginning from lower 
(ground) to upper (tree top) levels. 

Genomic DNA preparation 

After washing leaf samples in 10% sodium dodecyl
sulfate (SDS), DNA was extracted using the
cetyltrimethylammonium bromide (CTAB) method [40].
Briefly, 100-120-mg samples (wet weight) were
homogenized with a mortar and pestle using liquid
nitrogen. One milliliter of CTAB solution (200 mM
Tris-HCl [pH 9.0], 2% [w/v] CTAB, 2% [w/v]
polyvinylpyrrolidone, 0.1% [v/v] 2-mercaptoethanol,
1.4 M NaCl, and 20 mM ethylenediaminetetraacetic acid)
was immediately added to the crushed cells, followed by
incubation for 1 h at 65°C. After incubation, a 24:1
chloroform-isoamyl alcohol mixture was added; the
solution was mixed gently and then centrifuged for
10 min at 12,000 × g (14,000 rpm). This step was
repeated twice. An equal volume of propanol was then
added to the supernatant, which was centrifuged for
5 min at 16,000 × g (15,000 rpm). In most cases, the
pellet obtained was washed with 70% ethanol,
centrifuged, and desiccated using an evaporator. Finally,
100 μl of phosphate-buffered saline was added to the
precipitate to dissolve the pellet.

GP technology is robust enough that slight impurities
of denatured proteins or polysaccharides do not
interfere. However, other plant cell components, such as
alkaloids and secondary metabolites, can inhibit the PCR
reaction; consequently, DNA samples were diluted prior
to amplification.

Genome profiling (GP) 

Genome profiling (GP) uses a set of DNA fragments
sampled from genomic DNAs, and is composed of three
fundamental steps: random PCR, micro-temperature
gradient gel electrophoresis (μTGGE), and data
normalization by computer processing [22,32] (Figure 2).
Random PCR can employ arbitrary primers because
primer binding to template DNA is relaxed at
sufficiently low annealing temperatures. This attribute
allows samples of unknown genomic sequence, for which
specific primers cannot be designed, to be amplified. As
a consequence, DNA fragments from any genomic DNA
can be collected independently of the sequence of the
oligonucleotide primer used [30,31] (note that a single
primer is used for random PCR).
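How a single arbitrary primer can yield a set of fragments under relaxed binding can be illustrated with a toy in-silico model. The Python sketch below is a deliberate simplification, not the GP protocol: it treats annealing as a plain mismatch count (ignoring 3'-end stability, thermodynamics, and primer concentration), and the function names and the `max_mismatches` and `max_length` parameters are hypothetical. Only the pfm12 sequence is taken from the text.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def count_mismatches(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))


def binding_sites(template: str, probe: str, max_mismatches: int) -> list[int]:
    """Start positions where probe aligns to the template with few enough mismatches."""
    k = len(probe)
    return [i for i in range(len(template) - k + 1)
            if count_mismatches(template[i:i + k], probe) <= max_mismatches]


def random_pcr_products(template: str, primer: str, max_mismatches: int = 2,
                        max_length: int = 2000) -> list[tuple[int, int]]:
    """Predict (start, end) amplicons from a single primer on a double-stranded template.

    A forward site is where the primer sequence itself appears on the top strand
    (the primer anneals to the bottom strand there); a reverse site is where its
    reverse complement appears on the top strand. Any forward site followed by a
    non-overlapping downstream reverse site within max_length gives a product.
    """
    k = len(primer)
    forward = binding_sites(template, primer, max_mismatches)
    reverse = binding_sites(template, reverse_complement(primer), max_mismatches)
    products = []
    for f in forward:
        for r in reverse:
            end = r + k
            if r >= f + k and end - f <= max_length:
                products.append((f, end))
    return products


pfm12 = "AGAACGCGCCTG"
template = "TTTT" + pfm12 + "A" * 30 + reverse_complement(pfm12) + "GGGG"
print(random_pcr_products(template, pfm12, max_mismatches=0))  # [(4, 58)]
```

Raising `max_mismatches` plays the role of lowering the annealing temperature in this toy model: imperfect sites begin to count as binding sites, so a genome of unknown sequence yields amplicons without any designed primer pair.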

Random PCR 

Random PCR was performed using primers HUNT
(5'-dTGCTGCTGCTGC-3') and pfm12
(5'-dAGAACGCGCCTG-3'), which were Cy3-labeled at their
5' ends. The reaction mixture (25 μl total volume) for
random PCR contained 1 ng template DNA, 100 μM primer
DNA, 200 μM dNTPs, 10 mM Tris-HCl (pH 9.0), 50 mM KCl,
2.5 mM MgCl₂, and 0.02 unit μl⁻¹ Taq DNA polymerase
(Takara Bio Inc., Shiga, Japan). During random PCR,
contamination by other organisms must be carefully
avoided. To inactivate any contaminating DNAs that could
act as templates, the entire random PCR solution, without
the template DNA, was therefore UV-irradiated prior to the
reaction. Random PCR was carried out using 30 cycles of
denaturation (94°C, 30 s), annealing (26°C, 1 min), and
extension (47°C, 1 min) on a C1000 thermal cycler
(Bio-Rad, Hercules, CA, USA). The second random PCR
mixture (50 μl volume) contained 1 μl of the first PCR
product as template and the same concentrations of
constituents used in the original reaction. The reaction
was performed using 10 cycles of denaturation (94°C,
30 s), annealing (60°C, 1 min), and extension (74°C,
1 min). Only 10 cycles were used to ensure that the
reaction was terminated before all primer molecules were
consumed; this was necessary to guarantee that the major
PCR products were in a double-stranded state and thus
suitable for TGGE analysis (i.e., so that the melting
transition of double-stranded DNA to a single-stranded
form can be detected).

μTGGE analysis

For μTGGE, we used a tiny slab gel (24 × 16 × 1 mm³)
set on a μ-TG temperature-gradient generator (Taitec,
Iruma, Japan) for electrophoresis [32]. Two internal
reference DNAs with known melting patterns were
co-migrated during each electrophoretic run to calibrate
each genome profile, giving highly reproducible results
[41]: a 200-bp Ref1 (a 191-bp fragment from the
bacteriophage fd gene VIII, sites 1350-1540, attached to a
9-bp sequence, CTACGTCTC, at the 3' end; Tm = 60°C)
and a 900-bp Ref2 taken from a 4361-bp pBR322 fragment
(Tm = 61.4°C). Fluorescently labeled primers MA1
(5'-cy3-dTGCTACGTCTCTTCCGATGCTGTCTTTCGCT-3') and MA2
(5'-dCCTTGAATTCTATCGGTTTATCA-3'), and Ref6F
(5'-cy3-dGCCGGCATCACCGGCGCCACAGGTGCGGTTG-3') and Ref6R
(5'-dTAGCGAGGTGCCGCCGGCTTCCATTCAGGTC-3') were used to
generate internal references 1 and 2, respectively. The
gel used was 6% polyacrylamide (19:1 acrylamide:bis)
containing 500 mM Tris-HCl, 485 mM boric acid,
20 mM EDTA (pH 8.0), and 8 M urea. Approximately
2 μg of DNA was loaded onto the gel and subjected to
electrophoresis with a linear temperature gradient of 15
to 65°C for 12 minutes at 100 V cm⁻¹. After
electrophoresis, DNA bands were detected using an FX
Molecular Imager fluorescence imager (Bio-Rad).

Computer-aided data analysis 

Genome profiles obtained by the GP method are highly 
informative, but difficult to interpret because of their 
complexity. To overcome this problem, feature points 
called spiddos can be introduced [22]. Spiddos corres- 
pond to points where DNA structural transitions occur, 
such as from double-stranded to single-stranded DNA 
[42]. Spiddo coordinates, which are sequence- and
size-dependent, are obtained reproducibly by internal
reference-mediated normalization: the coordinates of the
two reference points contained in each GP profile (Ref1
and Ref2, Figure 2C) are used to calibrate the coordinates
of the feature points of the sample DNA.
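The effect of two-reference normalization can be sketched as follows. This Python example assumes an independent linear (affine) mapping on each axis that sends Ref1 to (0, 0) and Ref2 to (1, 1); the actual GP software may use a different convention, and `normalize` and the example coordinates are hypothetical.

```python
from typing import Tuple

Point = Tuple[float, float]  # (mobility, temperature) in raw image units


def normalize(point: Point, ref1: Point, ref2: Point) -> Point:
    """Rescale each axis linearly so that ref1 maps to 0 and ref2 maps to 1."""
    mobility, temperature = (
        (p - r1) / (r2 - r1) for p, r1, r2 in zip(point, ref1, ref2)
    )
    return (mobility, temperature)


# The same band seen in two runs; the second run has an offset and scale drift
# (x -> 2x + 5 on the mobility axis, y -> 0.5y - 3 on the temperature axis).
run_a = normalize((60.0, 120.0), ref1=(10.0, 20.0), ref2=(110.0, 220.0))
run_b = normalize((125.0, 57.0), ref1=(25.0, 7.0), ref2=(225.0, 107.0))
print(run_a, run_b)  # (0.5, 0.5) (0.5, 0.5)
```

Because both references experience the same run-to-run offset and scaling as the sample bands, such drift cancels out. Non-linear distortions of the gradient would not be corrected by a mapping of this form.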

Using these normalized coordinates, a pattern similar- 
ity score (PaSS) between two genomes can be measured 
as follows: 




where p_i and p'_i correspond to the normalized positional
vectors (composed of two elements, mobility and
temperature) of the i-th spiddo in each of the two genome
profiles, and i denotes the spiddo serial number. In
general, 0 ≤ PaSS ≤ 1, and thus 0 ≤ d_G ≤ 1. PaSS is
equal to one when two spiddo sets match perfectly.

Genomic distance (d_G), a more practical form, is
derived from PaSS as follows:

$$d_G = 1 - \mathrm{PaSS} \qquad (7)$$

If d G is sufficiently small (<< 1), the two genomes of 
interest belong to the same species. 
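The Figure 3 caption describes PaSS as the summed displacement between spiddos divided by their number. Taking that description at face value, the Python sketch below assumes that PaSS is 1 minus the mean Euclidean displacement between paired, normalized spiddos, clipped at zero. The one-to-one pairing, the Euclidean norm, the clipping, and all function names are assumptions made for illustration. The averaging helper mirrors the averaging of repeated trials described for Figures 1D and 4B.

```python
import math
from typing import List, Sequence, Tuple

Spiddo = Tuple[float, float]  # normalized (mobility, temperature)


def average_spiddos(trials: Sequence[Sequence[Spiddo]]) -> List[Spiddo]:
    """Average corresponding spiddos over repeated GP trials of one sample."""
    n = len(trials)
    return [
        (sum(t[i][0] for t in trials) / n, sum(t[i][1] for t in trials) / n)
        for i in range(len(trials[0]))
    ]


def pass_score(a: Sequence[Spiddo], b: Sequence[Spiddo]) -> float:
    """Assumed PaSS: 1 - mean displacement of paired spiddos, clipped at 0."""
    if not a or len(a) != len(b):
        raise ValueError("spiddo sets must be non-empty and paired one-to-one")
    mean_displacement = sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)
    return max(0.0, 1.0 - mean_displacement)


def genomic_distance(a: Sequence[Spiddo], b: Sequence[Spiddo]) -> float:
    """Eq. 7: d_G = 1 - PaSS."""
    return 1.0 - pass_score(a, b)


leaf_1 = [(0.20, 0.40), (0.55, 0.70)]
leaf_2 = [(0.23, 0.44), (0.58, 0.74)]  # every spiddo shifted by (0.03, 0.04)
print(round(genomic_distance(leaf_1, leaf_1), 3),
      round(genomic_distance(leaf_1, leaf_2), 3))  # 0.0 0.05
```

Under this reading, identical spiddo sets give d_G = 0, and a uniform shift of 0.05 in normalized units gives d_G of about 0.05.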



Cluster analysis of GP data 

To cluster samples based on calculated d_G values, we
used Ward's clustering method as implemented in the
software program FreeLighter [25,43,44].
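FreeLighter's internals are not described here, so the following Python code is only a generic sketch of agglomerative clustering with Ward's criterion, implemented through the standard Lance-Williams update. It applies the update directly to the d_G values (as R's "ward.D" option does) rather than to squared Euclidean distances. This is a heuristic, because d_G is not guaranteed to be Euclidean. The function name, input format, and example distances are hypothetical.

```python
from itertools import combinations


def ward_clustering(labels, distances):
    """Agglomerative clustering using the Lance-Williams update for Ward's method.

    labels: leaf names; distances: dict mapping frozenset({a, b}) to a distance.
    Returns the merges in order as (cluster_a, cluster_b, height); merged
    clusters are represented as nested tuples of the clusters they contain.
    """
    size = {label: 1 for label in labels}
    dist = {frozenset(pair): distances[frozenset(pair)]
            for pair in combinations(labels, 2)}
    merges = []
    while len(size) > 1:
        closest = min(dist, key=dist.get)   # pair with the smallest distance
        height = dist.pop(closest)
        i, j = sorted(closest, key=str)     # deterministic order for output
        n_i, n_j = size.pop(i), size.pop(j)
        merged = (i, j)
        for k, n_k in size.items():
            d_ki = dist.pop(frozenset((k, i)))
            d_kj = dist.pop(frozenset((k, j)))
            # Lance-Williams coefficients for Ward's method
            dist[frozenset((k, merged))] = (
                (n_i + n_k) * d_ki + (n_j + n_k) * d_kj - n_k * height
            ) / (n_i + n_j + n_k)
        size[merged] = n_i + n_j
        merges.append((i, j, height))
    return merges


pairs = {
    ("A1-1", "A1-2"): 0.02, ("A3-1", "A3-2"): 0.03,
    ("A1-1", "A3-1"): 0.10, ("A1-1", "A3-2"): 0.10,
    ("A1-2", "A3-1"): 0.10, ("A1-2", "A3-2"): 0.10,
}
d = {frozenset(p): v for p, v in pairs.items()}
for a, b, h in ward_clustering(["A1-1", "A1-2", "A3-1", "A3-2"], d):
    print(a, b, round(h, 3))  # leaves from the same branch merge first
```

With the hypothetical distances above, the two same-branch pairs merge first, and the two branch clusters join last at a height of 0.175. This mirrors the branch-level grouping described in the Results.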

Sequencing 

DNA bands of interest were extracted from TGGE microgels
and used as PCR templates in reaction mixtures
containing 320 μM dNTPs, 100 μM primer pfm12
(5'-dAGAACGCGCCTG-3'), 10 mM Tris-HCl (pH 9.0),
50 mM KCl, 2.5 mM MgCl₂, and 0.02 unit μl⁻¹ Taq DNA
polymerase (Takara). Reaction conditions consisted of
30 cycles of denaturation at 94°C for 30 s, annealing at
60°C for 60 s, and extension at 74°C for 60 s. The
resulting random PCR products (DNA) were ligated into
pGEM-T Easy vectors (Promega, Madison, WI, USA) at
4°C overnight. Competent cells of E. coli DH5α (Toyobo
Co. Ltd., Osaka, Japan) were transformed with the
ligation product. Transformed cells were cultivated on LB
agar plates (1% tryptone, 0.5% yeast extract, 1% NaCl
[pH 7.0], and 1.5% agar) supplemented with ampicillin
(10 mg in 200 ml of LB medium), 20 μl X-Gal
(50 mg ml⁻¹ in dimethylformamide), and 100 μl of 0.1 M
IPTG (isopropyl β-D-1-thiogalactopyranoside). The agar
plates were incubated at 37°C for 12-14 h. White colonies
on the plates were picked with a sterile toothpick,
transferred to LB broth (1% tryptone, 0.5% yeast extract,
and 1% NaCl, pH 7.0; 10 mg ampicillin), and incubated at
37°C for 12-14 h with shaking at about 180 rpm. After
confirmation of gene insertion, plasmid DNA was purified
using a Wizard Plus SV Minipreps DNA purification
system (Promega) and commercially sequenced (Operon
Bio-technology, Tokyo, Japan).

Availability of supporting data 

Data sets supporting the results of this study are in- 
cluded within the article and its additional files. 

Additional files 



Additional file 1: Figure S1. Sequence-based clustering of leaves from (A)
Yoshino cherry and (B) Japanese beech trees. Only sequence data that could
be consistently assigned were used. Clustering was performed using Consensus
Maker v2.0.0 (http://www.hiv.lanl.gov/content/sequence/CONSENSUS/consensus.html).
Yoshino cherry tree leaf number designations are arbitrary.

Additional file 2: DNA consensus sequence data of leaves used for
analysis were derived using Consensus Maker v2.0.0 (KJ411230-KJ411277).

Additional file 3: Table S1. Genome distances (d_G) among genomes of
Japanese beech tree leaves. Each value is the average of two
independent experiments.

Additional file 4: Table S2. Genome distances (d_G) among genomes of
Yoshino cherry tree leaves. Each value is the average of three
independent experiments.
GP: Genome profiling; NGS: Next-generation sequencing; μTGGE: Micro-temperature
gradient gel electrophoresis; Spiddos: Species identification
dots; PaSS: Pattern similarity score; d_G: Genomic distance.